Multistream Diarization Fusion Using the Minimum Variance Bayesian Information Criterion
Abstract
The spread of ubiquitous, individualized recorders makes speaker diarization a necessity. We focus on the specific task of speaker diarization from two information streams: two microphones assigned to two participants of interest. In real scenarios, speakers may be co-located in noisy environments with interfering speakers. Multistream diarization can exploit the additional information, but it requires fusing the individual diarization streams. In this work, we first introduce a new database that realistically simulates a range of extremely challenging acoustic conditions, and then propose a Minimum Variance of BIC (MVBIC) method, based on the Bayesian Information Criterion (BIC), to combine information from the various diarization streams. We validate the proposed method on a two-microphone subset of our proposed database, using Root Mean Square Energy (RMSE) and Mel Frequency Cepstral Coefficients (MFCC) as our two diarization streams. We show that our proposed method exploits the complementarity of the individual diarization streams and outperforms static fusion mixing weights. We also demonstrate the robustness of the MVBIC method on RT-06S data.
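The abstract does not spell out the MVBIC rule, so the following is only a minimal sketch of the ingredients it names, under stated assumptions. `delta_bic` is the standard BIC comparison between modelling two adjacent segments with one Gaussian or with two (diagonal covariances are assumed here for simplicity). `fuse` is a static linear mix of two per-stream score sequences. `min_variance_weight` is a hypothetical reading of "minimum variance": it picks the mixing weight whose fused scores have the smallest variance. All function names and the grid search are illustrative and are not taken from the paper.

```python
import math
from statistics import pvariance


def log_det_diag(frames):
    """Log-determinant of a diagonal covariance fitted to a list of feature vectors."""
    dims = len(frames[0])
    total = 0.0
    for d in range(dims):
        var = pvariance([f[d] for f in frames])
        total += math.log(max(var, 1e-10))  # floor avoids log(0) on constant data
    return total


def delta_bic(left, right, penalty_weight=1.0):
    """Delta-BIC for modelling left and right separately instead of jointly.

    Positive values favour two models, i.e. a speaker change between segments.
    Assumption: one diagonal-covariance Gaussian per segment.
    """
    both = left + right
    n, n1, n2 = len(both), len(left), len(right)
    dims = len(both[0])
    gain = 0.5 * (n * log_det_diag(both)
                  - n1 * log_det_diag(left)
                  - n2 * log_det_diag(right))
    # The second Gaussian adds d means and d variances: 2d extra parameters.
    penalty = 0.5 * penalty_weight * (2 * dims) * math.log(n)
    return gain - penalty


def fuse(scores_a, scores_b, w):
    """Static linear fusion: weight w on stream A and (1 - w) on stream B."""
    return [w * a + (1.0 - w) * b for a, b in zip(scores_a, scores_b)]


def min_variance_weight(scores_a, scores_b, steps=20):
    """Hypothetical rule: grid-search the weight that minimises fused-score variance."""
    candidates = [i / steps for i in range(steps + 1)]
    return min(candidates, key=lambda w: pvariance(fuse(scores_a, scores_b, w)))


# Toy usage: two clearly different one-dimensional segments.
quiet = [[0.0], [0.1], [-0.1], [0.05], [-0.05]]
loud = [[5.0], [5.1], [4.9], [5.05], [4.95]]
change_score = delta_bic(quiet, loud)   # positive: a change is favoured
same_score = delta_bic(quiet, quiet)    # negative: only the penalty remains
```

In this sketch a stream whose scores carry extra noise receives a smaller weight, which is one way a fusion rule could adapt per recording rather than rely on a fixed mixing weight.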
Similar articles
Speaker Diarization Using Gaussian Mixture Turns and Segment Matching
Speaker diarization aims to detect “who spoke when” in large audio segments. It is an important task in the processing of broadcast news audio, as it eases the selection and indexing of audio segments. In this paper, an unsupervised speaker diarization scheme is proposed that uses a Gaussian Mixture Model as a Universal Background Model, the Bayesian Information Criterion, and fingerprint detection. A de...
Improving Speaker Diarization
This paper describes the LIMSI speaker diarization system used in the RT-04F evaluation. The RT-04F system builds upon the LIMSI baseline data partitioner, which is used in the broadcast news transcription system. This partitioner provides a high cluster purity but has a tendency to split the data from a speaker into several clusters when there is a large quantity of data for the speaker. In th...
Speaker Diarization: From Broadcast News to Lectures
This paper presents the LIMSI speaker diarization system for lecture data, in the framework of the Rich Transcription 2006 Spring (RT-06S) meeting recognition evaluation. This system builds upon the baseline diarization system designed for broadcast news data. The baseline system combines agglomerative clustering based on Bayesian information criterion with a second clustering using state-of-th...
An improved speaker diarization system
This paper describes an automatic speaker diarization system for natural, multi-speaker meeting conversations. Only one central microphone is used to record the meeting. The new system is robust to different acoustic environments; it requires neither pre-trained models nor development sets to initialize the parameters. The new system determines the model complexity automatically. It adapts the ...
Investigating Various Diarization Algorithms for Speaker in the Wild (SITW) Speaker Recognition Challenge
Collecting training data for real-world text-independent speaker recognition is challenging. In practice, utterances for a specific speaker are often mixed with many other acoustic signals. To guarantee the recognition performance, the segments spoken by target speakers should be precisely picked out. An automatic detection could be developed to reduce the cost of expensive human hand-made anno...